09. Quiz: Expected Sarsa
Say that an agent is learning to navigate the gridworld described earlier in the lesson.
[Figure: Gridworld Example]
Suppose the agent is using Expected Sarsa in its search for the optimal policy, with $\alpha = 0.1$.
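For reference, the standard Expected Sarsa update rule can be written as follows. The discount rate $\gamma$ and the policy $\pi$ (typically $\epsilon$-greedy with respect to the current Q-table) are not given in this excerpt, so treat them as assumptions to be filled in from the rest of the lesson.

```latex
Q(S_t, A_t) \leftarrow Q(S_t, A_t)
  + \alpha \Big[ R_{t+1}
  + \gamma \sum_{a} \pi(a \mid S_{t+1}) \, Q(S_{t+1}, a)
  - Q(S_t, A_t) \Big]
```

The sum is the expected action value in the next state under $\pi$; Sarsa would instead use the value of the single action actually sampled there.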
At the end of the 99th episode, the Q-table has the following values:
[Figure: Q-table]
Say that at the beginning of the 100th episode, the agent starts in state 1 and selects action right. As a result, it receives reward -1, and the next state is state 2.
[Figure: Beginning of the 100th episode]
In the previous video, you learned that at this point in time, the agent updates the Q-table.
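A minimal Python sketch of that single update step is shown below. The Q-table values in the lesson are given only in the figure, so the numbers here are hypothetical placeholders, as are the action ordering (up, down, left, right), the discount rate `gamma = 1.0`, and the `epsilon = 0.4` used for the epsilon-greedy policy. Only `alpha = 0.1`, the reward of -1, and the transition from state 1 to state 2 via action right come from the text.

```python
import numpy as np


def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an epsilon-greedy policy for one state.

    Ties are broken in favor of the first maximal action (np.argmax behavior).
    """
    n_actions = len(q_row)
    probs = np.full(n_actions, epsilon / n_actions)
    probs[np.argmax(q_row)] += 1.0 - epsilon
    return probs


def expected_sarsa_update(Q, state, action, reward, next_state,
                          alpha, gamma, epsilon):
    """Return the updated value of Q[state, action] after one Expected Sarsa step."""
    probs = epsilon_greedy_probs(Q[next_state], epsilon)
    expected_next_value = np.dot(probs, Q[next_state])
    target = reward + gamma * expected_next_value
    return Q[state, action] + alpha * (target - Q[state, action])


# Hypothetical action indices; the lesson's figure may order them differently.
UP, DOWN, LEFT, RIGHT = 0, 1, 2, 3

# Hypothetical Q-table rows (NOT the values from the lesson's figure).
Q = np.zeros((3, 4))              # row 0 is unused so that rows match state numbers
Q[1] = [1.0, 2.0, 3.0, 4.0]       # state 1
Q[2] = [2.0, 4.0, 6.0, 8.0]       # state 2

new_q = expected_sarsa_update(
    Q, state=1, action=RIGHT, reward=-1.0, next_state=2,
    alpha=0.1,      # from the text
    gamma=1.0,      # assumption: not stated in this excerpt
    epsilon=0.4,    # assumption: not stated in this excerpt
)
print(round(new_q, 2))  # 4.18 for these placeholder values
```

With these placeholder numbers, the expected value of state 2 is 0.1·2 + 0.1·4 + 0.1·6 + 0.7·8 = 6.8, so the target is -1 + 6.8 = 5.8 and the entry for (state 1, right) moves from 4.0 to 4.0 + 0.1·(5.8 - 4.0) = 4.18. Substituting the values from the lesson's Q-table gives the quiz answer in the same way.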